-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TE with threading build #1092
TE with threading build #1092
Conversation
Signed-off-by: Phuong Nguyen <[email protected]>
Signed-off-by: Phuong Nguyen <[email protected]>
Signed-off-by: Phuong Nguyen <[email protected]>
I updated the default value for threading to 1 and added a message for parallel build info. |
Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Phuong Nguyen <[email protected]>
Signed-off-by: Phuong Nguyen <[email protected]>
Signed-off-by: Phuong Nguyen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -68,7 +68,7 @@ def setup_pytorch_extension( | |||
print("Could not determine CUDA Toolkit version") | |||
else: | |||
if version >= (11, 2): | |||
nvcc_flags.extend(["--threads", "4"]) | |||
nvcc_flags.extend(["--threads", os.getenv("NVTE_BUILD_THREADS_PER_JOB", "1")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phu0ngng could you document here the reason for switching from 4 to 1 as a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @timmoon10 mentioned, users getting stuck in the build process (#976, #978, #1077).
Besides, the PyTorch Ext build with --thread 4
broke on my desktop. I discussed with Tim and we think it is better to have 1 as default so that all users have fewer build issues.
For "advanced" users, they are encouraged to play with NVTE_BUILD_THREADS_PER_JOB
to best match their system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* added threading build back * integrating threading for pytorch and paddle extensions * added messages --------- Signed-off-by: Phuong Nguyen <[email protected]> Co-authored-by: Tim Moon <[email protected]>
Description
In #987, TE build with parallel jobs was added, while threading for fused_attn build was removed. With #1048, the build time of the ext can be measured and compared.
It turns out that using max parallel jobs together with setting
--threads 4
halves the build time. However, it adds some potential oversubscribes that may break the build on NGC. SoNVTE_BUILD_THREADS_PER_JOB
env var was added to provide an option to disable this threading build.By default,
NVTE_BUILD_THREADS_PER_JOB=1
to avoid potential build issues on small systems.Users are encouraged to play with the
NVTE_BUILD_THREADS_PER_JOB
value based on their system.NVTE_MAX_BUILD_JOBS
was renamed toNVTE_BUILD_MAX_JOBS
for consistency.Type of change
Checklist: